k-Mismatch with Don't Cares
Abstract
We give the first non-trivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n, a pattern p of length m with don’t care symbols, and a bound k, our algorithms find all the places where the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer with high probability. We then present a new deterministic O(nk^2 log^3 m) time solution that uses tools developed for group testing, and finally an approach based on k-selectors that runs in O(nk polylog m) time but requires O(poly m) time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
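To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms described in the abstract: it runs in O(nm) time and serves only as a reference definition of what those algorithms compute. The function name `k_mismatch_naive` and the choice of `?` as the don't care symbol are assumptions made for illustration.

```python
from typing import Dict, List

WILDCARD = "?"  # assumed don't care symbol for this sketch


def k_mismatch_naive(text: str, pattern: str, k: int) -> Dict[int, List[int]]:
    """Map each alignment with at most k mismatches to its mismatch offsets.

    A don't care in the pattern matches any text symbol. This is the
    O(nm) brute-force baseline, not an algorithm from the abstract.
    """
    n, m = len(text), len(pattern)
    result: Dict[int, List[int]] = {}
    for i in range(n - m + 1):          # every alignment of pattern in text
        mismatches: List[int] = []
        for j in range(m):
            if pattern[j] != WILDCARD and pattern[j] != text[i + j]:
                mismatches.append(j)    # record the pattern offset
                if len(mismatches) > k:
                    break               # this alignment already exceeds k
        if len(mismatches) <= k:
            result[i] = mismatches
    return result


if __name__ == "__main__":
    # Pattern "a?c" against "abcabdabc", allowing one mismatch.
    print(k_mismatch_naive("abcabdabc", "a?c", 1))
```

With k = 1 the example reports alignments 0, 3 and 6; alignment 3 has a single mismatch at pattern offset 2 (text `d` against pattern `c`). The mismatch positions come for free here, which mirrors the abstract's remark that the locations are reported at no extra cost.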
Similar resources
A Filtering Algorithm for k -Mismatch with Don't Cares
We present a filtering-based algorithm for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places where the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Θ(n m^(1/3) k^(1/3) log^(2/3) m) time.
On pattern matching with k mismatches and few don't cares
We consider the problem of pattern matching with k mismatches, where there can be don't care or wild card characters in the pattern. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorith...
Pattern matching with don't cares and few errors
We present solutions for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an Θ(n(k + log m log k) log n) time randomised algorithm which finds the correct answer with high probability....
Approximate String Matching with Variable Length Don't Care
Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...
Miltersen. Lower Bounds for Union-Split-Find Related Problems on Random Access Machines. In
...to instances with fewer don't cares. The hardest case seems to be when queries have exactly d 2 don't cares. In this case npm is extremely rich. Almost all entries in the communication matrix are one. However, perhaps not surprisingly, we have the following: Claim 16. The communication matrix for npm restricted to queries with exactly d 2 don't cares contains a 1-monochromatic rectangle o...
Publication date: 2007